{\rtf1\ansi\uc1\deff0\deflang1024
{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}
{\f1\fnil\fcharset0 Arial;}
{\f2\fnil\fcharset0 Arial;}
{\f3\fnil\fcharset0 Courier New;}
{\f4\fnil\fcharset0 Zapf Chancery;}
{\f5\fnil\fcharset0 STIXGeneral;}
}
{\colortbl;
\red0\green0\blue0;
\red0\green0\blue255;
\red0\green255\blue255;
\red0\green255\blue0;
\red255\green0\blue255;
\red255\green0\blue0;
\red255\green255\blue0;
\red255\green255\blue255;
\red0\green0\blue128;
\red0\green128\blue128;
\red0\green128\blue0;
\red128\green0\blue128;
\red128\green0\blue0;
\red128\green128\blue0;
\red128\green128\blue128;
\red192\green192\blue192;
\red239\green219\blue197;
\red205\green149\blue117;
\red253\green217\blue181;
\red120\green219\blue226;
\red135\green169\blue107;
\red255\green164\blue116;
\red250\green231\blue181;
\red159\green129\blue112;
\red253\green124\blue110;
\red35\green35\blue35;
\red31\green117\blue254;
\red173\green173\blue214;
\red25\green158\blue189;
\red115\green102\blue189;
\red222\green93\blue131;
\red203\green65\blue84;
\red180\green103\blue77;
\red255\green127\blue73;
\red234\green126\blue93;
\red176\green183\blue198;
\red255\green255\blue153;
\red28\green211\blue162;
\red255\green170\blue204;
\red221\green68\blue146;
\red29\green172\blue214;
\red188\green93\blue88;
\red221\green148\blue117;
\red154\green206\blue235;
\red255\green188\blue217;
\red253\green219\blue109;
\red43\green108\blue196;
\red239\green205\blue184;
\red110\green81\blue96;
\red29\green249\blue20;
\red113\green188\blue120;
\red109\green174\blue129;
\red195\green100\blue197;
\red204\green102\blue102;
\red231\green198\blue151;
\red252\green217\blue117;
\red168\green228\blue160;
\red149\green145\blue140;
\red28\green172\blue120;
\red240\green232\blue145;
\red255\green29\blue206;
\red178\green236\blue93;
\red93\green118\blue203;
\red202\green55\blue103;
\red59\green176\blue143;
\red253\green252\blue116;
\red252\green180\blue213;
\red255\green189\blue136;
\red246\green100\blue175;
\red205\green74\blue74;
\red151\green154\blue170;
\red255\green130\blue67;
\red200\green56\blue90;
\red239\green152\blue170;
\red253\green188\blue180;
\red26\green72\blue118;
\red48\green186\blue143;
\red25\green116\blue210;
\red255\green163\blue67;
\red186\green184\blue108;
\red255\green117\blue56;
\red230\green168\blue215;
\red65\green74\blue76;
\red255\green110\blue74;
\red28\green169\blue201;
\red255\green207\blue171;
\red197\green208\blue230;
\red253\green215\blue228;
\red21\green128\blue120;
\red252\green116\blue253;
\red247\green128\blue161;
\red142\green69\blue133;
\red116\green66\blue200;
\red157\green129\blue186;
\red255\green29\blue206;
\red255\green73\blue108;
\red214\green138\blue89;
\red255\green72\blue208;
\red227\green37\blue107;
\red238\green32\blue77;
\red255\green83\blue73;
\red192\green68\blue143;
\red31\green206\blue203;
\red120\green81\blue169;
\red255\green155\blue170;
\red252\green40\blue71;
\red118\green255\blue122;
\red159\green226\blue191;
\red165\green105\blue79;
\red138\green121\blue93;
\red69\green206\blue162;
\red251\green126\blue253;
\red205\green197\blue194;
\red128\green218\blue235;
\red236\green234\blue190;
\red255\green207\blue72;
\red253\green94\blue83;
\red250\green167\blue108;
\red252\green137\blue172;
\red219\green215\blue210;
\red23\green128\blue109;
\red222\green170\blue136;
\red119\green221\blue231;
\red253\green252\blue116;
\red146\green110\blue174;
\red247\green83\blue148;
\red255\green160\blue137;
\red143\green80\blue157;
\red237\green237\blue237;
\red162\green173\blue208;
\red255\green67\blue164;
\red252\green108\blue133;
\red205\green164\blue222;
\red252\green232\blue131;
\red197\green227\blue132;
\red255\green182\blue83;
}
{\stylesheet
{\s0\qj\widctlpar\f0\fs22 \snext0 Normal;}
{\cs10 \additive\ssemihidden Default Paragraph Font;}
{\s1\qc\sb240\sa120\keepn\f0\b\fs40 \sbasedon0\snext0 Part;}
{\s2\ql\sb240\sa120\keepn\f0\b\fs40 \sbasedon0\snext0 heading 1;}
{\s3\ql\sb240\sa120\keepn\f0\b\fs32 \sbasedon0\snext0 heading 2;}
{\s4\ql\sb240\sa120\keepn\f0\b\fs32 \sbasedon0\snext0 heading 3;}
{\s5\ql\sb240\sa120\keepn\f0\b\fs24 \sbasedon0\snext0 heading 4;}
{\s6\ql\sb240\sa120\keepn\f0\b\fs24 \sbasedon0\snext0 heading 5;}
{\s7\ql\sb240\sa120\keepn\f0\b\fs24 \sbasedon0\snext0 heading 6;}
{\s8\qr\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext8 rightpar;}
{\s9\qc\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext9 centerpar;}
{\s10\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext10 leftpar;}
{\s11\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equation;}
{\s12\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equationNum;}
{\s13\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equationAlign;}
{\s14\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equationAlignNum;}
{\s15\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equationArray;}
{\s16\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 equationArrayNum;}
{\s17\ql\sb120\sa120\keep\widctlpar\f0\fs20 \sbasedon0\snext0 theorem;}
{\s18\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 bitmapCenter;}
{\s20\qc\sb240\sa240\b\f0\fs36 \sbasedon0\snext21 Title;}
{\s21\qc\sa120\f0\fs22 \sbasedon0\snext0 author;}
{\s22\ql\tqc\tx4536\tqr\tx9072\f0\fs20 \sbasedon0\snext22 footer;}
{\s23\ql\tqc\tx4536\tqr\tx9072\f0\fs20 \sbasedon0\snext23 header;}
{\s30\ql\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 caption;}
{\s31\qc\sb120\sa0\keep\widctlpar\f0\fs20 \sbasedon0\snext0 Figure;}
{\s32\qc\sb120\sa0\keep\widctlpar\f0\fs20 \sbasedon0\snext32 Table;}
{\s33\qc\sb120\sa0\keep\widctlpar\f0\fs20 \sbasedon0\snext33 Tabular;}
{\s34\qc\sb120\sa0\keep\widctlpar\f0\fs20 \sbasedon0\snext34 Tabbing;}
{\s35\qj\li1024\ri1024\fi340\widctlpar\f0\fs20 \sbasedon0\snext35 Quote;}
{\s38\ql\widctlpar\f3\fs22 \snext38 verbatim;}
{\s46\ql\fi-283\li283\lin283\sb0\sa120\widctlpar\tql\tx283\f0\fs20 \sbasedon0\snext46 List;}
{\s47\ql\fi-283\li283\lin283\sb0\sa120\widctlpar\tql\tx283\f0\fs20 \sbasedon0\snext47 List 1;}
{\s50\qc\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 latex picture;}
{\s51\qc\sb120\sa120\keep\widctlpar\f0 \sbasedon0\snext0 subfigure;}
{\s61\ql\sb240\sa120\keepn\f0\b\fs32 \sbasedon0\snext62 bibheading;}
{\s62\ql\fi-567\li567\sb0\sa0\f0\fs20 \sbasedon0\snext62 bibitem;}
{\s64\ql\fi-283\li283\lin283\sb0\sa120\widctlpar\tql\tx283\f0\fs20 \sbasedon0\snext64 endnotes;}
{\s65\ql\fi-113\li397\lin397\f0\fs22 \sbasedon0\snext65 footnote text;}
{\s66\qj\fi-170\li454\lin454\f0\fs22 \sbasedon0\snext66 endnote text;}
{\cs62\super \additive\sbasedon10 footnote reference;}
{\cs63\super \additive\sbasedon10 endnote reference;}
{\s67\ql\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext67 acronym;}
{\s70\qc\sa120\b\f0\fs22 \sbasedon0\snext71 abstract title;}
{\s71\qj\li1024\ri1024\fi340\widctlpar\f0\fs22 \sbasedon0\snext0 abstract;}
{\s80\ql\sb240\sa120\keepn\f0\b\fs20 \sbasedon0\snext0 contents_heading;}
{\s81\ql\li425\tqr\tldot\tx8222\sb240\sa60\keepn\f0\fs22\b \sbasedon0\snext82 toc 1;}
{\s82\ql\li512\tqr\tldot\tx8222\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext83 toc 2;}
{\s83\ql\li1024\tqr\tldot\tx8222\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext84 toc 3;}
{\s84\ql\li1536\tqr\tldot\tx8222\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext85 toc 4;}
{\s85\ql\li2048\tqr\tldot\tx8222\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext86 toc 5;}
{\s86\ql\li2560\tqr\tldot\tx8222\sb60\sa60\keepn\f0\fs22 \sbasedon0\snext86 toc 6;}
}
{\info
{\title Original file was Thesis.tex}
{\doccomm Created using latex2rtf 2.3.3 r1230 (released Feb 26, 2013) on Fri Jun 14 13:06:13 2013
}
}
{\footer\pard\plain\f0\fs22\qc\chpgn\par}
\paperw11960\paperh16900\margl2500\margr2560\margt2520\margb1820\pgnstart0\widowctrl\qj\ftnbj\f0\aftnnar
{\pard\plain\ql\sb240\sa120\keepn\f0\b\fs40\sl240\slmult1 \fi0 \fs20 Chapter {\*\bkmkstart BMchapter_conclusions}7{\*\bkmkend BMchapter_conclusions}\par
\pard\plain\s2\ql\sb240\sa120\keepn\f0\b\fs40\sl240\slmult1 \sb240 \fi0 Conclusions\par
{\footer\pard\plain\f0\fs22\qc\chpgn\par}
{\header\pard\plain\tqc\tx3450\tqr\tx6900 Chapter 7. {\i Conclusions}\tab
\par}
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb240 \fi0 Throughout this thesis, we introduced a number of research questions. In this chapter, we discuss the status of each question in light of the results that we obtained, and then outline directions for future research.\par
\pard\plain\s3\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb240 \fi0 1  Research Questions\par
\pard\plain\s4\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb180 \fi0 1.1  Towards Self-Linking Linked Data\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 The vision of Self-Linking Linked Data introduced in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_vision \\* MERGEFORMAT }}{\fldrslt{2}}} is feasible. The components described in this thesis could be deployed immediately as a beta version of the self-linking behavior. However, we acknowledge that many research questions have to be answered before the proposed self-linking behavior is fully implemented in the Linked Data.\par
\pard\plain\s4\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb120 \fi0 1.2  SERIMI: Class-based Matching for Instance Matching Across Heterogeneous Datasets\par
{\i \pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 \scaps0\i How can we obtain correct matches for a set of source instances when there is no overlap between the source and target schemas?  (Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_serimi \\* MERGEFORMAT }}{\fldrslt{3}}})} \par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 First, cases in which the schemas do not overlap do occur in real-world matching tasks. In Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_vision \\* MERGEFORMAT }}{\fldrslt{2}}}, we discussed one such scenario and showed that the proposed method can satisfactorily solve the instance matching problem in that case. In this scenario, the source data contained no properties besides the labels of the entities (band names); consequently, no property in the source schema could overlap with the target dataset (MusicBrainz). We also observed little schema overlap between the source and target datasets in the OAEI 2011 benchmark, mainly due to the New York Times collection. In the person and organization datasets of this collection, only a label and a type property overlapped with the target schemas. This shows that the lack of schema overlap also appears in the reference benchmark of the field and is not an isolated case. These cases reinforce that the problem tackled in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_serimi \\* MERGEFORMAT }}{\fldrslt{3}}} is indeed relevant.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 We have shown that it is possible to match instances when the schemas do not overlap by applying class-based matching. We observed that this method is more effective when many instances in the target dataset share the same or a similar label, as is the case in Geonames and DBPedia. In those cases, and when there is no schema overlap, class-based matching is the best solution and also the only one that we are aware of. \par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 The results lead us to conclude that class-based matching should be combined with direct matching to obtain the best instance matching performance (accuracy) on average. We acknowledge that direct matching indeed performs better than class-based matching when there is enough schema overlap in the data, i.e., when the overlapping predicates can identify the correct target instance for a source instance. In conclusion, class-based matching and direct matching complement each other, because no single method performs optimally in all matching tasks. In the future, these two approaches should be combined with other approaches to cover a larger set of matching tasks.\par
\pard\plain\s4\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb120 \fi0 1.3  Efficient and Effective On-the-fly Candidate Selection over SPARQL Endpoints\par
{\i \pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 \scaps0\i How can we obtain candidate matches for a set of source instances in an effective and time-efficient way, by querying a remote target endpoint?  (Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_sonda \\* MERGEFORMAT }}{\fldrslt{4}}})}\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 First, we would like to mention that the querying solution that we propose is on average 10 times faster than the first solution that we considered for this problem at the beginning of our research. This can be observed by comparing the two implementations of SERIMI on GitHub, one using the discussed method{\cs62\super\chftn}
{\*\footnote\pard \s65\ql\fi-113\li397\lin397\f0\fs22{\cs62\super\chftn} https://github.com/samuraraujo/SondaSerimi}
 and the other a naive querying approach{\cs62\super\chftn}
{\*\footnote\pard \s65\ql\fi-113\li397\lin397\f0\fs22{\cs62\super\chftn} https://github.com/samuraraujo/SERIMI-RDF-Interlinking}
. The gain in efficiency compared to the alternative approaches discussed in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_sonda \\* MERGEFORMAT }}{\fldrslt{4}}} is not as large as the gain observed between these two implementations, but it is still considerable. The results indicate that it is indeed possible to obtain candidate matches via querying. As we showed, this strategy is more efficient than downloading the data and processing it locally for all cases proposed in the OAEI benchmark. We have tested this approach in many other real-world scenarios, and it produced results comparable to those discussed in this thesis. We are convinced that this strategy is a good approach to instance matching on the Linked Data, especially when we consider the integration of third-party data with DBPedia and Geonames, two evident hubs in this network [{Auer et\~al.}, 2007; {Kobilarov et\~al.}, 2009]. We think so because of the benefits that this method brings and especially because it supports the vision of the Self-Linking Linked Data. We envision that in the coming years instance matching via querying will gain popularity in the Linked Data.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 We acknowledge that there is a long path of research ahead in this field. For instance, query engines could be optimized to answer matching queries by using specific query operators (e.g., like) and by tuning the index structure to this end. Another direction to explore is to use the algorithm designed in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_string \\* MERGEFORMAT }}{\fldrslt{5}}} to learn the correct formatting of strings in the target dataset. This would allow us to build exact queries as opposed to approximate queries. Exact queries are more precise and more efficient to compute, as we observed in our experiments. However, a deeper investigation of this problem is necessary to draw a convincing conclusion. We expect this work to pave the way for a new research direction in the field.\par
\pard\plain\s4\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb120 \fi0 1.4  Learning Edit-Distance Based String Transformation Rules From Examples\par
{\i \pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 \scaps0\i How can we learn string transformation rules from a limited set of examples that can correctly transform a large number of unseen strings similar to the examples?  (Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_string \\* MERGEFORMAT }}{\fldrslt{5}}})}\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 The results obtained in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_string \\* MERGEFORMAT }}{\fldrslt{5}}} are promising. We have shown that the rule learner algorithm can learn rules with high coverage from a single example. These good results are due partly to the distribution of the data in the datasets that we evaluated and partly to the novel algorithm that we proposed. The method works well because most human-readable strings, or strings produced by human language, are quite homogeneous. No algorithm can transform data if there is no regularity or pattern between the data and the example transformations used to learn the rules. We argue that it will be hard to improve the current coverage of the rule learner without compromising its efficiency. In contrast, the rule selector can still be improved by using state-of-the-art machine learning techniques to select a specific rule from the set of learned ones. \par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 In conclusion, the proposed method can learn string transformation rules from a limited set of examples that can correctly transform a large number of unseen strings similar to the examples, assuming that the examples are representative.\par
\pard\plain\s4\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb120 \fi0 1.5  Exercises on Knowledge Base Acceleration\par
{\i \pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 \scaps0\i How can we design a model of centrality for news documents?  (Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_trec \\* MERGEFORMAT }}{\fldrslt{6}}})}\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 The study conducted in this work indicates that the model of centrality depends heavily on the explicit decisions made by the human annotators. The use of information retrieval techniques or semantics in the modeling of centrality produces only a marginal improvement over an elementary baseline, because it does not capture the decisions of the human annotators. We argue that centrality is a concept that needs a clear definition so that it can be reproduced computationally.\par
\pard\plain\s3\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb240 \fi0 {\*\bkmkstart BMBibliography}2{\*\bkmkend BMBibliography}  Future Research\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \sb60 \fi0 In this thesis, we studied a variety of topics mostly focused on instance matching of heterogeneous and distributed data. Within each research theme, there exist several perspectives that are not fully addressed or open issues that are worth further investigation.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 We start with Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_vision \\* MERGEFORMAT }}{\fldrslt{2}}}. The visionary architecture that we proposed requires further research in many areas. Research on federated querying is necessary to improve the discovery of datasets by the self-linking engines. Research on self-linking policies is a brand new area to be explored. Most importantly, research on unsupervised approaches to instance matching should be revisited when considering its application to Organic Linked Data. Apart from that, research on indexing should be conducted to support the self-linking behavior. For example, as matching queries are quite specific, an index structure could be designed to support a quicker evaluation of these types of queries. Finally, research on the SPARQL language and protocol should investigate new primitives to support signaling between datasets in the Linked Data, specifically focusing on the self-linking behavior.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 In Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_serimi \\* MERGEFORMAT }}{\fldrslt{3}}}, our results on combining class-based matching and direct matching are preliminary. The results that we obtained raise a new research question: {\i How can we select the best matching strategy for a matching task? } The benefit of selecting the most appropriate matching strategy is a gain in efficiency, because unnecessary computation is avoided. Efficiency is another area of study, given that at a large scale it may make sense to consider pruning strategies, for example, avoiding the computation of scores for candidate instances that are identified as false positives early in the process.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 In Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_sonda \\* MERGEFORMAT }}{\fldrslt{4}}}, we concluded that exact queries are faster than other queries but do not work in all cases, mainly due to variations in string formatting between the source and target instances. This raises a new question: {\i How could we formulate exact instance matching queries? } The challenge is two-fold. First, finding a function that transforms a source string (instance label) into the correct target format is problematic. As data are heterogeneous, many formats may exist; consequently, how to select the correct one is an issue to be investigated. Second, there is no guarantee that the time to learn the transformation functions plus the time to execute the exact queries will be shorter than the time to execute the method proposed in Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_sonda \\* MERGEFORMAT }}{\fldrslt{4}}}. The efficiency of this new approach would need further investigation. Given the improvements that it may bring, this is definitely an interesting direction to explore.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 In Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_string \\* MERGEFORMAT }}{\fldrslt{5}}}, we obtained promising results using the proposed algorithm. Considering that this algorithm has a large range of applications, the identification of a rule selector method that performs optimally for a specific data collection is an interesting research area.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 In Chapter {\field{\*\fldinst{\lang1024 REF BMchapter_trec \\* MERGEFORMAT }}{\fldrslt{6}}}, we studied the particular problem proposed by TREC-KBA 2012. The research area of building a model of centrality for a news stream is still in its infancy. It would be interesting to investigate how human annotators define centrality by interviewing them. Further investigation of a centrality model based on the aspects they report would be of great interest to this community.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 Overall, the results in this thesis merit further investigation into how to adapt the proposed components to the Linked Data. Our findings also prompt the question of whether instance matching has a different nature in the Linked Data setting. Our studies and results indicate that it does, which might stimulate further research into the purpose of instance matching for the Semantic Web.\par
\pard\plain\s0\qj\widctlpar\f0\fs22\sl240\slmult1 \fi340 This thesis sheds more light on instance matching on heterogeneous and distributed data, broadens its relevance in the Linked Data and gives a range of topics for future research. \par
{\footer\pard\plain\f0\fs22\qc\chpgn\par}
{\header\pard\plain\tqc\tx3450\tqr\tx6900 {\i Bibliography}\tab
\par}
{\pard\plain\s61\ql\sb240\sa120\keepn\f0\b\fs32\sl240\slmult1 \sb120 \fi0 {\plain\b\fs32 References}\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \sb60 \li450\fi0 Carina\~F. Dorneles, Rodrigo Gon\'e7alves, and Ronaldo dos Santos\~Mello. Approximate data instance matching: a survey. {\i Knowl. Inf. Syst.}, 27 (1): 1\endash 21, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Ahmed\~K. Elmagarmid, Panagiotis\~G. Ipeirotis, and Vassilios\~S. Verykios. Duplicate record detection: A survey. {\i IEEE Trans. Knowl. Data Eng.}, 19 (1): 1\endash 16, 2007.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Pavel Shvaiko and J{\'e9}r{\'f4}me Euzenat. A survey of schema-based matching approaches. In {\i J. Data Semantics IV}, pages 146\endash 171. 2005.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Tim Berners-Lee, James Hendler, and Ora Lassila. {The Semantic Web: Scientific American}. {\i Scientific American}, May 2001. URL {\f3 http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&pageNumber=1&catID=2}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. {\i Int. J. Semantic Web Inf. Syst.}, 5 (3): 1\endash 22, 2009a.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Peter Christen. {\i Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection}. Data-centric systems and applications. Springer, 2012. ISBN 978-3-642-31163-5.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Hanna K{\'f6}pcke and Erhard Rahm. Frameworks for entity matching: A comparison. {\i Data Knowl. Eng.}, 69 (2): 197\endash 210, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Erhard Rahm and Philip\~A. Bernstein. A survey of approaches to automatic schema matching. {\i VLDB J.}, 10 (4): 334\endash 350, 2001.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Natalya\~Fridman Noy. Semantic integration: A survey of ontology-based approaches. {\i SIGMOD Record}, 33 (4): 65\endash 70, 2004.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 AnHai Doan and Alon\~Y. Halevy. Semantic integration research in the database community: A brief survey. {\i AI Magazine}, 26 (1): 83\endash 94, 2005.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, and Nikolas Mitrou. Ontology and database mapping: A survey of current implementations and future directions. {\i J. Web Eng.}, 7 (1): 1\endash 24, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Pavel Shvaiko and J{\'e9}r{\'f4}me Euzenat. Ontology matching: State of the art and future challenges. {\i IEEE Trans. Knowl. Data Eng.}, 25 (1): 158\endash 176, 2013.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andr{\'e9} Freitas, Edward Curry, Jo{\'e3}o\~Gabriel Oliveira, and Se{\'e1}n O\rquote Riain. Querying heterogeneous datasets on the linked data web: Challenges, approaches, and trends. {\i IEEE Internet Computing}, 16 (1): 24\endash 33, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Tim Berners-Lee, J.\~Hollenbach, Kanghao Lu, J.\~Presbrey, Eric Prud\rquote hommeaux, and Monica M.\~C. Schraefel. Tabulator redux: Browsing and writing linked data. In {\i LDOW}, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Mathieu d\rquote Aquin, Marta Sabou, Enrico Motta, Sofia Angeletou, Laurian Gridinoc, Vanessa Lopez, and Fouad Zablith. What can be done with the semantic web?  an overview watson-based applications. In {\i SWAP}, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Samur Ara{\'fa}jo and Daniel Schwabe. Explorator: A tool for exploring rdf data through direct manipulation. In {\i LDOW}, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Aidan Hogan, Andreas Harth, J{\'fc}rgen Umbrich, Sheila Kinsella, Axel Polleres, and Stefan Decker. Searching and browsing linked data with swse: The semantic web search engine. {\i J. Web Sem.}, 9 (4): 365\endash 401, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Harry Halpin, Patrick\~J. Hayes, James\~P. McCusker, Deborah\~L. McGuinness, and Henry\~S. Thompson. When owl: sameas isn\rquote t the same: An analysis of identity in linked data. In {\i International Semantic Web Conference (1)}, pages 305\endash 320, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Bo\~Hu and Glenn Svensson. A case study of linked enterprise data. In {\i International Semantic Web Conference (2)}, pages 129\endash 144, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Ian Millard, Hugh Glaser, Manuel Salvadores, and Nigel Shadbolt. Consuming multiple linked data sources: Challenges and experiences. In {\i COLD}, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Aidan Hogan. Integrating linked data through rdfs and owl: Some lessons learnt. In {\i RR}, pages 250\endash 256, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Prateek Jain, Pascal Hitzler, Peter\~Z. Yeh, Kunal Verma, and Amit\~P. Sheth. Linked data is merely more data. In {\i AAAI Spring Symposium: Linked Data Meets Artificial Intelligence}, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Aidan Hogan, J{\'fc}rgen Umbrich, Andreas Harth, Richard Cyganiak, Axel Polleres, and Stefan Decker. An empirical survey of linked data conformance. {\i J. Web Sem.}, 14: 14\endash 44, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Giovanni Tummarello and Renaud Delbru. Publishing data that links itself: A conjecture. In {\i AAAI Spring Symposium: Linked Data Meets Artificial Intelligence}, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Samur Ara{\'fa}jo, Duc Tran, Arjen DeVries, Jan Hidders, and Daniel Schwabe. Serimi: Class-based disambiguation for effective instance matching over heterogeneous web data. In {\i WebDB}, pages 25\endash 30, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 J{\'e9}r{\'f4}me Euzenat, Alfio Ferrara, Willem\~Robert van Hage, Laura Hollink, Christian Meilicke, Andriy Nikolov, Dominique Ritze, Fran\'e7ois Scharffe, Pavel Shvaiko, Heiner Stuckenschmidt, Ondrej Sv{\'e1}b-Zamazal, and C{\'e1}ssia\~Trojahn dos Santos. Results of the ontology alignment evaluation initiative 2011. In {\i OM}, 2011a.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Matthew Michelson and Craig\~A. Knoblock. Learning blocking schemes for record linkage. In {\i AAAI}, pages 440\endash 445, 2006.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Robert Isele, Anja Jentzsch, and Christian Bizer. Efficient multidimensional blocking for link discovery without losing recall. In {\i WebDB}, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 George Papadakis, Ekaterini Ioannou, Claudia Nieder{\'e9}e, Themis Palpanas, and Wolfgang Nejdl. Beyond 100 million entities: large-scale blocking-based resolution for heterogeneous data. In {\i WSDM}, pages 53\endash 62, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Ivan\~P. Fellegi and Alan\~B. Sunter. A theory for record linkage. {\i Journal of the American Statistical Association}, 64 (328): 1183\endash 1210, 1969. ISSN 01621459. URL {\f3 http://www.jstor.org/stable/2286061}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Olaf Hartig, Christian Bizer, and Johann\~Christoph Freytag. Executing sparql queries over the web of linked data. In {\i International Semantic Web Conference}, pages 293\endash 309, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Olaf G{\'f6}rlitz and Steffen Staab. Federated data management and query optimization for linked open data. In {\i New Directions in Web Data Management 1}, pages 109\endash 137. 2011a.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. Fedx: Optimization techniques for federated query processing on linked data. In {\i International Semantic Web Conference (1)}, pages 601\endash 616, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Wei Hu, Yuzhong Qu, and Xingzhi Sun. Bootstrapping object coreferencing on the semantic web. {\i J. Comput. Sci. Technol.}, 26 (4): 663\endash 675, 2011a.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andriy Nikolov, Victoria\~S. Uren, Enrico Motta, and Anne N.\~De Roeck. Refining instance coreferencing results using belief propagation. In {\i ASWC}, pages 405\endash 419, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Fabian\~M. Suchanek, Serge Abiteboul, and Pierre Senellart. Paris: Probabilistic alignment of relations, instances, and schema. {\i PVLDB}, 5 (3): 157\endash 168, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Dezhao Song and Jeff Heflin. Automatically generating data linkages using a domain-independent candidate selection approach. In {\i ISWC}, pages 649\endash 664, 2011{ a}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Shu Rong, Xing Niu, Evan\~Wei Xiang, Haofen Wang, Qiang Yang, and Yong Yu. A machine learning approach for instance matching based on similarity metrics. In {\i International Semantic Web Conference (1)}, pages 460\endash 475, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andreas Schultz, Andrea Matteini, Robert Isele, Christian Bizer, and Christian Becker. Ldif - linked data integration framework. In {\i COLD}, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Robert Isele and Christian Bizer. Learning expressive linkage rules using genetic programming. {\i PVLDB}, 5 (11): 1638\endash 1649, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao. Describing linked datasets. In {\i LDOW}, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 J{\'e9}r{\'f4}me Euzenat and Petko Valtchev. Similarity-based ontology alignment in owl-lite. In {\i ECAI}, pages 333\endash 337, 2004.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Luiz Andr{\'e9} P.\~Paes Leme, Marco\~A. Casanova, Karin\~Koogan Breitman, and Antonio\~L. Furtado. Instance-based owl schema matching. In {\i ICEIS}, pages 14\endash 26, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Xing Niu, Shu Rong, Yunlong Zhang, and Haofen Wang. Zhishi.links results for oaei 2011. In {\i OM}, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 J{\'e9}r{\'f4}me Euzenat, Alfio Ferrara, Christian Meilicke, Andriy Nikolov, Juan Pane, Fran{\'e7}ois Scharffe, Pavel Shvaiko, Heiner Stuckenschmidt, Ondrej Sv{\'e1}b-Zamazal, Vojtech Sv{\'e1}tek, and C{\'e1}ssia Trojahn dos Santos. Results of the ontology alignment evaluation initiative 2010. In Pavel Shvaiko, J{\'e9}r{\'f4}me Euzenat, Fausto Giunchiglia, Heiner Stuckenschmidt, Ming Mao, and Isabel Cruz, editors, {\i Proc. 5th ISWC workshop on ontology matching (OM), Shanghai (CN)}, pages 85\endash 117, 2010. URL {\f3 http://oaei.ontologymatching.org/2010/results/oaei2010.pdf}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Jiannan Wang, Guoliang Li, Jeffrey\~Xu Yu, and Jianhua Feng. Entity matching: How similar is similar. {\i PVLDB}, 4 (10): 622\endash 633, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Mauricio\~A. Hern{\'e1}ndez and Salvatore\~J. Stolfo. The merge/purge problem for large databases. In {\i SIGMOD Conference}, pages 127\endash 138, 1995.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andrew McCallum, Kamal Nigam, and Lyle\~H. Ungar. Efficient clustering of high-dimensional data sets with application to reference matching. In {\i KDD}, pages 169\endash 178, 2000.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 George Papadakis and Wolfgang Nejdl. Efficient entity resolution methods for heterogeneous information spaces. In {\i ICDE Workshops}, pages 304\endash 307, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Dezhao Song and Jeff Heflin. Automatically generating data linkages using a domain-independent candidate selection approach. In {\i International Semantic Web Conference (1)}, pages 649\endash 664, 2011{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Yongtao Ma and Thanh Tran. Typifier: Inferring the type semantics of structured data. In {\i ICDE}, 2013.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Amos Tversky. Features of similarity. {\i Psychological Review}, 84 (4): 327\endash 352, July 1977. URL {\f3 http://dx.doi.org/10.1037/0033-295X.84.4.327}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Surajit Chaudhuri, Bee-Chung Chen, Venkatesh Ganti, and Raghav Kaushik. Example-driven design of efficient record matching queries. In {\i VLDB}, pages 327\endash 338, 2007.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 William Chauvenet. {\i A Manual of Spherical and Practical Astronomy V.II}. Dover, 1960.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Wei Hu, Jianfeng Chen, and Yuzhong Qu. A self-training approach for resolving object coreference on the semantic web. In {\i WWW}, pages 87\endash 96, 2011{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Sergey Melnik, Hector Garcia-Molina, and Erhard Rahm. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In {\i ICDE}, pages 117\endash 128, 2002.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. Discovering and maintaining links on the web of data. In {\i International Semantic Web Conference}, pages 650\endash 665, 2009{ a}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 William\~W. Cohen, Pradeep\~D. Ravikumar, and Stephen\~E. Fienberg. A comparison of string distance metrics for name-matching tasks. In {\i IIWeb}, pages 73\endash 78, 2003{ a}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Alexander Budanitsky and Graeme Hirst. Evaluating wordnet-based measures of lexical semantic relatedness. {\i Computational Linguistics}, 32 (1): 13\endash 47, 2006.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Xianpei Han and Jun Zhao. Structural semantic relatedness: A knowledge-based method to named entity disambiguation. In {\i ACL}, pages 50\endash 59, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 W.\~Cohen, P.\~Ravikumar, and S.\~Fienberg. {\i A comparison of string metrics for matching names and records}. August 2003{ b}. URL {\f3 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.5.9007}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Karl Branting. A comparative evaluation of name-matching algorithms. In {\i ICAIL}, pages 224\endash 232, 2003.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andriy Nikolov, Mathieu d\rquote Aquin, and Enrico Motta. Unsupervised learning of link discovery configuration. In {\i ESWC}, pages 119\endash 133, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Zhichun Wang, Xiao Zhang, Lei Hou, Yue Zhao, Juanzi Li, Yu\~Qi, and Jie Tang. Rimom results for oaei 2010. In {\i OM}, 2010.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Christoph B{\'f6}hm, Gerard de\~Melo, Felix Naumann, and Gerhard Weikum. Linda: distributed web-of-data-scale entity matching. In {\i CIKM}, pages 2104\endash 2108, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Afraz Jaffri, Hugh Glaser, and Ian Millard. Uri disambiguation in the context of linked data. In {\i LDOW}, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. Silk - a link discovery framework for the web of data. In {\i LDOW}, 2009{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Alfio Ferrara, Andriy Nikolov, and Fran\'e7ois Scharffe. Data linking for the semantic web. {\i Int. J. Semantic Web Inf. Syst.}, 7 (3): 46\endash 76, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Philippe Cudr{\'e9}-Mauroux, Parisa Haghani, Michael Jost, Karl Aberer, and Hermann de\~Meer. idmesh: graph-based disambiguation of linked data. In {\i WWW}, pages 591\endash 600, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Avigdor Gal. {\i Uncertain Schema Matching}. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Bijan Parsia. Querying the web with sparql. In {\i Reasoning Web}, pages 53\endash 67, 2006.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Cristian\~P{\'e9}rez de\~Laborda and Stefan Conrad. Bringing relational data into the semanticweb using sparql and relational.owl. In {\i ICDE Workshops}, page\~55, 2006.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Sebastian Dietzold and S{\'f6}ren Auer. Integrating sparql endpoints into directory services. In {\i SFSW}, 2007.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Walter Corno, Francesco Corcoglioniti, Irene Celino, and Emanuele\~Della Valle. Exposing heterogeneous data sources as sparql endpoints through an object-oriented abstraction. In {\i ASWC}, pages 434\endash 448, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Olaf G{\'f6}rlitz and Steffen Staab. Splendid: Sparql endpoint federation exploiting void descriptions. In {\i COLD}, 2011{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Gabriela Montoya, Maria-Esther Vidal, {\'d3}scar Corcho, Edna Ruckhaus, and Carlos\~Buil Aranda. Benchmarking federated sparql query engines: Are existing testbeds enough? In {\i International Semantic Web Conference (2)}, pages 313\endash 324, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 George Papadakis, Ekaterini Ioannou, Claudia Nieder{\'e9}e, and Peter Fankhauser. Efficient entity resolution for large heterogeneous information spaces. In {\i WSDM}, pages 535\endash 544, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Stuart\~J. Russell and Peter Norvig. {\i Artificial Intelligence - A Modern Approach (3. internat. ed.)}. Pearson Education, 2010. ISBN 978-0-13-207148-2.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Jorge P{\'e9}rez, Marcelo Arenas, and Claudio Gutierrez. Semantics and complexity of sparql. {\i ACM Trans. Database Syst.}, 34 (3), 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Robert\~D. Carr, Srinivas Doddi, Goran Konjevod, and Madhav\~V. Marathe. On the red-blue set cover problem. In {\i SODA}, pages 345\endash 353, 2000.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Ian\~H. Witten and Eibe Frank. {\i Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations}. Morgan Kaufmann, 1999. ISBN 1-55860-552-5.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Andrew\~Y. Ng and Michael\~I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In {\i NIPS}, pages 841\endash 848, 2001.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 David\~J. Hand and Keming Yu. Idiot\rquote s bayes\emdash not so stupid after all? {\i International Statistical Review}, 69 (3): 385\endash 398, 2001. ISSN 1751-5823. {10.1111/j.1751-5823.2001.tb00465.x}. URL {\f3 http://dx.doi.org/10.1111/j.1751-5823.2001.tb00465.x}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 J{\'e9}r{\'f4}me Euzenat, Christian Meilicke, Heiner Stuckenschmidt, Pavel Shvaiko, and C{\'e1}ssia\~Trojahn dos Santos. Ontology alignment evaluation initiative: Six years of experience. {\i J. Data Semantics}, 15: 158\endash 192, 2011{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Juanzi Li, Jie Tang, Yi\~Li, and Qiong Luo. Rimom: A dynamic multistrategy ontology alignment framework. {\i IEEE Trans. Knowl. Data Eng.}, 21 (8): 1218\endash 1232, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Indrajit Bhattacharya and Lise Getoor. Query-time entity resolution. {\i J. Artif. Intell. Res. (JAIR)}, 30: 621\endash 657, 2007.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Ahmed Metwally and Christos Faloutsos. V-smart-join: A scalable mapreduce framework for all-pair similarity joins of multisets and vectors. {\i PVLDB}, 5 (8): 704\endash 715, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Daniel\~M. Herzig and Thanh Tran. Heterogeneous web data search using relevance-based on the fly data integration. In {\i WWW}, pages 141\endash 150, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Usama\~M. Fayyad. Data mining and knowledge discovery: Making sense out of data. {\i IEEE Expert}, 11 (5): 20\endash 25, 1996.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Erhard Rahm and Hong\~Hai Do. Data cleaning: Problems and current approaches. {\i IEEE Data Eng. Bull.}, 23 (4): 3\endash 13, 2000.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Arvind Arasu, Surajit Chaudhuri, and Raghav Kaushik. Learning string transformations from examples. {\i PVLDB}, 2 (1): 514\endash 525, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Matthew Michelson and Craig\~A. Knoblock. Mining the heterogeneous transformations between data sources to aid record linkage. In {\i IC-AI}, pages 422\endash 428, 2009.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiadou, and Jun\rquote ichi Tsujii. A discriminative candidate generator for string transformations. In {\i EMNLP}, pages 447\endash 456, 2008.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Vladimir\~I. Levenshtein. {Binary codes capable of correcting deletions, insertions, and reversals}. Technical Report\~8, 1966.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Donald\~E. Knuth. {\i The Art of Computer Programming, Volume III: Sorting and Searching}. Addison-Wesley, 1973. ISBN 0-201-03803-X.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Owen\~L. Astrachan. Bubble sort: an archaeological algorithmic analysis. In {\i SIGCSE}, pages 1\endash 5, 2003.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Thomas\~H. Cormen, Charles\~E. Leiserson, Ronald\~L. Rivest, and Clifford Stein. {\i Introduction to Algorithms (3. ed.)}. MIT Press, 2009. ISBN 978-0-262-03384-8.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Dan Gusfield. {\i Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology}. Cambridge University Press, 1997. ISBN 0-521-58519-8.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Peter Linz. {\i An introduction to formal languages and automata (4. ed.)}. Jones and Bartlett Publishers, 2006. ISBN 978-0-7637-3798-6.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 L.C. Molina, L.\~Belanche, and A.\~Nebot. Feature selection algorithms: a survey and experimental evaluation. In {\i Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on}, pages 306\endash 313, 2002. {10.1109/ICDM.2002.1183917}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Chotirat\~(Ann) Ratanamahatana and Dimitrios Gunopulos. Feature selection for the naive bayesian classifier using decision trees. {\i Applied Artificial Intelligence}, 17 (5-6): 475\endash 487, 2003.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Cai-Nicolas Ziegler, Sean\~M. McNee, Joseph\~A. Konstan, and Georg Lausen. Improving recommendation lists through topic diversification. In {\i Proceedings of the 14th international conference on World Wide Web}, WWW \rquote 05, pages 22\endash 32, New York, NY, USA, 2005. ACM. ISBN 1-59593-046-9. {10.1145/1060745.1060754}. URL {\f3 http://doi.acm.org/10.1145/1060745.1060754}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Jie Cheng and Russell Greiner. Comparing bayesian network classifiers. In {\i UAI}, pages 101\endash 108, 1999.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Sunanda Patro and Wei Wang. Learning top-k transformation rules. In {\i DEXA (1)}, pages 172\endash 186, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Sheila Tejada, Craig\~A. Knoblock, and Steven Minton. Learning domain-independent string transformation weights for high accuracy object identification. In {\i KDD}, pages 350\endash 359, 2002.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Guillaume Bouchard and Bill Triggs. The trade-off between generative and discriminative classifiers. In {\i Proceedings in Computational Statistics, 16th Symposium of IASC}, pages 721\endash 728. Physica-Verlag, 2004.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In {\i POPL}, pages 317\endash 330, 2011.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Rishabh Singh and Sumit Gulwani. Learning semantic string transformations from examples. {\i PVLDB}, 5 (8): 740\endash 751, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Bo\~Wu, Pedro\~A. Szekely, and Craig\~A. Knoblock. Learning transformation rules by examples. In {\i AAAI}, 2012.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Giorgio Satta and John\~C. Henderson. String transformation learning. In Philip\~R. Cohen and Wolfgang Wahlster, editors, {\i ACL}, pages 444\endash 451. Morgan Kaufmann Publishers / ACL, 1997. URL {\f3 http://dblp.uni-trier.de/db/conf/acl/acl97.html#SattaH97}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Vijayshankar Raman and Joseph\~M. Hellerstein. Potter\rquote s wheel: An interactive data cleaning system. In {\i VLDB}, pages 381\endash 390, 2001.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 J.\~R. Frank, M.\~Kleiman-Weiner, D.\~A. Roberts, F.\~Niu, C.\~Zhang, C.\~Re, and I.\~Soboroff. {Building an Entity-Centric Stream Filtering Test Collection for TREC 2012}. In {\i Proceedings of the Text REtrieval Conference (TREC)}, 2012. URL {\f3 http://trec.nist.gov/act_part/conference/notebook.papers/KBA.OVERVIEW.pdf}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 S{\'f6}ren Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary\~G. Ives. Dbpedia: A nucleus for a web of open data. In {\i ISWC/ASWC}, pages 722\endash 735, 2007.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Christian Bizer, Jens Lehmann, Georgi Kobilarov, S{\'f6}ren Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. Dbpedia - a crystallization point for the web of data. {\i J. Web Sem.}, 7 (3): 154\endash 165, 2009{ b}.\par
\pard\plain\s62\ql\fi-567\li567\sb0\sa0\f0\fs20\sl240\slmult1 \li450\fi0 Georgi Kobilarov, Tom Scott, Yves Raimond, Silver Oliver, Chris Sizemore, Michael Smethurst, Christian Bizer, and Robert Lee. Media meets semantic web - how the bbc uses dbpedia and linked data to make connections. In {\i ESWC}, pages 723\endash 737, 2009.\par
}}}
